Audio signal bandwidth extension

ABSTRACT

Described is a transmission system comprising a transmitter for transmitting a narrowband audio signal to a receiver via a transmission channel. The receiver comprises a bandwidth extender (18) for generating a wideband audio signal from the narrowband audio signal. The bandwidth extender (18) comprises spectral folding means (30) for generating a spectrally folded audio signal (33) by spectrally folding at least part of the narrowband audio signal. The transmission system according to the invention is characterized in that the bandwidth extender (18) comprises a noise shaper (32) for generating a shaped noise signal (35) by shaping a noise signal (31) in accordance with at least part of the spectrally folded audio signal (33), and in that the bandwidth extender (18) further comprises a combiner (34) for combining the shaped noise signal (35) and the spectrally folded audio signal (33) into the wideband audio signal. In this way, metallic sounds introduced by the spectral folding are masked by combining the shaped noise signal (35) with the spectrally folded signal (33).

The invention relates to a bandwidth extender for generating a wideband audio signal from a narrowband audio signal, wherein the narrowband audio signal has a first bandwidth and the wideband audio signal has a second bandwidth, wherein the second bandwidth is larger than the first bandwidth, wherein the bandwidth extender comprises an input for receiving the narrowband audio signal and an output for supplying the wideband audio signal, wherein the bandwidth extender further comprises spectral folding means coupled to the input, and wherein the spectral folding means are arranged for generating a spectrally folded audio signal by spectrally folding at least part of the narrowband audio signal.

The invention further relates to a receiver for receiving a narrowband audio signal via a transmission channel, the receiver comprising a bandwidth extender for generating a wideband audio signal from a narrowband audio signal, to a method of receiving a narrowband audio signal, and to a method of generating a wideband audio signal from a narrowband audio signal.

The paper “Speech Enhancement Via Frequency Bandwidth Extension Using Line Spectral Frequencies” by S. Chennoukh, A. Gerrits, G. Miet and R. Sluijter, in the proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, May 8-11, 2001, describes a bandwidth extender that may, for example, be used for audio signals, e.g. speech signals or music signals, received from a transmission medium such as a radio channel, a coaxial cable or an optical fiber. Other possible applications are automatic answering machines, dictating machines, (mobile) telephones, MP3 players and spoken books.

Narrowband speech, as used in the existing telephone networks, has a bandwidth of 3100 Hz (300-3400 Hz). Speech sounds more natural if the bandwidth is increased to around 7 kHz (50-7000 Hz). Speech with this bandwidth is called wideband speech and has an additional low band (50-300 Hz) and high band (3400-7000 Hz). From the narrowband speech signal, it is possible to generate a high band and a low band by extrapolation. The resulting speech signal is called a pseudo-wideband speech signal. Several techniques for extending the bandwidth of narrowband signals are known, for example from the paper “A new technique for wideband enhancement of coded narrowband speech”, IEEE Speech Coding Workshop 1999, Jun. 20-23, 1999, Porvoo, Finland. These techniques are used to improve the speech quality in a narrowband network, such as a telephone network, without changing the network. At the receiving side (e.g. a mobile phone or a telephone answering machine), the narrowband speech can be extended to pseudo-wideband speech.

FIG. 5 shows a block diagram of a bandwidth extender 18 as used in the known transmission system. An input narrowband speech signal, which is sampled at 8 kHz and which is supplied to the input 20 of the bandwidth extender 18, is first up-sampled by two (i.e. zeros are inserted between successive samples of the input narrowband speech signal) by an up-sampler 60. The obtained up-sampled signal 61 is sampled at 16 kHz. It has the same spectrum in the lowband, i.e. 0-4 kHz, as the input signal and a folded version of it in the highband, i.e. 4-8 kHz. This signal 61 is then low-pass filtered in a low-pass filter 62 to remove the folded version in order to recover the same spectral properties as the input signal but sampled at 16 kHz. The low-pass filtered signal 63 is supplied to an LPC analysis filter 70 and to a down-sampler 64.

The low-pass filtered signal 63 is down-sampled by two by the down-sampler 64. Then, the resulting down-sampled signal 65 (which is sampled at 8 kHz) is modeled using an auto-regressive LPC model by means of an LPC analysis filter 66. The LPC analysis filter 66 derives LPC coefficients 67 from the down-sampled signal 65. These LPC coefficients 67 represent the spectrum of the input narrowband speech signal. Next, the narrowband LPC coefficients 67 are used by an envelope extender 68 to extend the spectral envelope of the narrowband signal and to derive wideband LPC coefficients 69. This extension of the spectral envelope is performed by mapping lowband line spectral frequencies (LSFs) to wideband LSFs by means of a set of mapping matrices.

Then, the output signal 63 of the low-pass filter 62 is analyzed using an extended LPC analysis filter 70 on the basis of the wideband LPC coefficients 69. The analysis residual 71, which is expected to have a flat spectrum, is thereafter successively down-sampled and up-sampled by two (i.e. every other sample is set to zero) in spectral folding means 30. The successive down- and up-sampling realizes a spectral folding. The resulting spectrally folded signal 73 is a sparse signal that is used to excite a wideband synthesis filter 72 in order to obtain a wideband speech signal that is supplied to the output 22 of the bandwidth extender 18. The wideband synthesis filter 72 operates on the basis of the wideband LPC coefficients 69 and is the inverse of the analysis filter 70.

It is a drawback of the known bandwidth extender 18 that the spectral folding of the input narrowband signal introduces harmonic shifts (i.e. the harmonic components in the highband are not located exactly at the frequencies where they should be), which result in a crackling or metallic-like sound when reproduced. These harmonic shifts occur because the harmonic components of the highband are only harmonically related to those of the narrowband signal when harmonic sampling is used (which in general will not be the case).

It is an object of the invention to provide a bandwidth extender that does not suffer from this drawback. This object is achieved in that the bandwidth extender further comprises a noise shaper for generating a shaped noise signal by shaping a noise signal in accordance with at least part of the spectrally folded audio signal, wherein the bandwidth extender further comprises a combiner for combining the shaped noise signal and the spectrally folded audio signal into the wideband audio signal. It has been found that the harmonic shifts in the spectrally folded audio signal can effectively be masked by combining the spectrally folded audio signal and the shaped noise signal. As the harmonic shifts are masked, the undesired crackling/metallic-like sound is no longer audible when the wideband audio signal is reproduced.

The shaped noise signal may be generated by shaping a (white) noise signal in accordance with a property of (at least part of) the spectrally folded audio signal, e.g. in accordance with an amplitude or a phase of the spectrally folded audio signal. Preferably, the noise signal is shaped in proportion to an envelope (e.g. a temporal envelope) of at least part of the spectrally folded audio signal. Listening tests have shown that the combination of such a shaped noise signal and the spectrally folded audio signal results in a wideband audio signal of very good quality.

Such a shaped noise signal, shaped in proportion to an envelope of the spectrally folded audio signal, can advantageously be generated by a noise shaper that comprises an envelope extractor for extracting an envelope signal from the spectrally folded audio signal and a mixer for generating the shaped noise signal by mixing the noise signal with the envelope signal.

The envelope extractor preferably comprises a Hilbert transformer, which may comprise a cascade of a Fourier transformer for transforming a time domain representation of the spectrally folded signal into a frequency domain representation thereof, means for zeroing the negative frequencies of the frequency domain representation, an inverse Fourier transformer for transforming the zeroed frequency domain representation into a time domain representation thereof, and a rectifier for generating the envelope signal by rectifying this (complex) time domain representation.

The invention is defined by the independent claims. The dependent claims define advantageous embodiments.

The above object and features of the present invention will be more apparent from the following description of the preferred embodiments with reference to the drawings, wherein:

FIG. 1 shows a block diagram of an embodiment of the transmission system 10 comprising a receiver 14 having a bandwidth extender 18 according to the invention,

FIG. 2 shows a block diagram of an embodiment of a bandwidth extender 18 according to the invention,

FIG. 3 shows a block diagram of an embodiment of a noise shaper 32 for use in the transmission system 10 according to the invention,

FIG. 4 shows a block diagram of an embodiment of an envelope extractor/Hilbert transformer 40 for use in the transmission system 10 according to the invention,

FIG. 5 shows a block diagram of a prior art bandwidth extender 18,

FIG. 6 shows a block diagram of a part 74 of the bandwidth extender 18 of FIG. 5, which part 74 has been adapted in accordance with the present invention.

In the Figures, identical parts are provided with the same reference numbers.

FIG. 1 shows a block diagram of an embodiment of the transmission system 10 according to the invention. The transmission system 10 comprises a transmitter 12 for transmitting a narrowband audio signal, e.g. a narrowband speech signal or a narrowband music signal, to a receiver 14 via a transmission channel 16. The transmission system 10 may be a telephone communication system wherein the transmitter may be a (mobile) telephone and wherein the receiver may be a (mobile) telephone or an answering machine. The receiver 14 comprises a bandwidth extender 18 for generating a wideband audio signal from the narrowband audio signal. The bandwidth of the narrowband audio signal is smaller than the bandwidth of the wideband audio signal. The bandwidth extender 18 comprises an input 20 for receiving the narrowband audio signal and an output 22 for supplying the wideband audio signal to additional signal processing parts (not shown) of the receiver 14, which additional signal processing parts may be arranged for amplification, reproduction and/or storage of the wideband audio signal.

FIG. 2 shows a block diagram of an embodiment of a bandwidth extender 18 according to the invention, which bandwidth extender 18 may be used in the transmission system 10 according to the invention. The bandwidth extender 18 comprises an input 20, spectral folding means 30, a noise shaper 32, a combiner 34 and an output 22. The spectral folding means 30 are coupled to the input 20 so that a narrowband audio signal that is received via the input 20 is supplied to the spectral folding means 30. The spectral folding means 30 are arranged for generating a spectrally folded audio signal 33 by spectrally folding at least part of the narrowband audio signal. The spectrally folded audio signal 33 is supplied by the spectral folding means 30 to the noise shaper 32 and the combiner 34. The noise shaper 32 is arranged for generating a shaped noise signal 35 by shaping a (white) noise signal 31 in accordance with at least part of the spectrally folded audio signal 33. The shaped noise signal 35 is supplied by the noise shaper 32 to the combiner 34. The combiner 34 is arranged for combining the shaped noise signal 35 and the spectrally folded audio signal 33 into a wideband audio signal. The bandwidth of the wideband audio signal is larger than the bandwidth of the narrowband audio signal. Finally, the wideband audio signal is supplied to the output 22.

Preferably, the noise shaper 32 is arranged for generating the shaped noise signal 35 by shaping the noise signal 31 in proportion to an envelope of at least part of the spectrally folded audio signal 33. Such a noise shaper 32 is shown in FIG. 3, which is a block diagram of an embodiment of the noise shaper 32 for use in the transmission system 10 according to the invention. The noise shaper 32 comprises an envelope extractor 40 and a mixer 42. The envelope extractor 40 is arranged for extracting an envelope signal 41 from the spectrally folded audio signal 33 and for supplying this envelope signal 41 to the mixer 42. The mixer 42 is arranged for generating the shaped noise signal 35 by multiplying the noise signal 31 with the envelope signal 41.
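The signal flow of FIGS. 2 and 3 can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the narrowband input is assumed to be available already at the wideband sampling rate (e.g. 16 kHz), the spectral folding means 30 are modelled as setting every other sample to zero (as in the prior-art folder of FIG. 5), and the envelope extractor 40 follows the FFT-based approach of FIG. 4. All function names are hypothetical.

```python
import numpy as np


def spectral_fold(x):
    """Spectral folding means 30 (assumed model): down-sample by two and
    up-sample by two again, i.e. set every other sample to zero."""
    folded = np.array(x, dtype=float)  # copy, so the input is not modified
    folded[1::2] = 0.0
    return folded


def envelope(x):
    """Envelope extractor 40: magnitude of the signal after its
    negative-frequency FFT bins have been set to zero (cf. FIG. 4)."""
    spectrum = np.fft.fft(x)
    spectrum[len(x) // 2 + 1:] = 0.0  # negative-frequency bins
    return np.abs(np.fft.ifft(spectrum))


def noise_shaper(folded, noise):
    """Noise shaper 32: mixer 42 multiplies the noise signal 31 by the
    envelope signal 41 of the spectrally folded signal 33."""
    return noise * envelope(folded)


def bandwidth_extend(x, rng):
    """Bandwidth extender 18 of FIG. 2 (sketch)."""
    folded = spectral_fold(x)                 # spectrally folded signal 33
    noise = rng.standard_normal(len(folded))  # white noise signal 31
    shaped = noise_shaper(folded, noise)      # shaped noise signal 35
    return folded + shaped                    # combiner 34 -> wideband output
```

In this sketch the combiner is a plain sum; any weighting between the two paths (as in the gain stages of FIG. 6) would be applied before the addition.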

FIG. 4 shows a block diagram of an embodiment of an envelope extractor/Hilbert transformer 40 for use in the receiver 14 according to the invention. The envelope extractor 40 preferably comprises a Hilbert transformer 40, which comprises a cascade of a fast Fourier transformer 50, a zeroing unit 52, an inverse fast Fourier transformer 54 and a rectifier 56. The fast Fourier transformer 50 transforms a time domain representation of the spectrally folded signal 33 into a frequency domain representation 51 thereof. The zeroing unit 52 is arranged for zeroing the negative frequencies of this frequency domain representation 51. The inverse fast Fourier transformer 54 is arranged for transforming the zeroed frequency domain representation 53 into a time domain representation 55 thereof. The rectifier 56 is arranged for generating the envelope signal 41 by rectifying (i.e. by taking the absolute value of) this complex time domain representation 55.
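The cascade of FIG. 4 maps directly onto NumPy's FFT routines. The sketch below assumes whole-signal processing; framing and windowing are not specified in the text and are left out. As described, only the negative frequencies are zeroed. The textbook analytic signal additionally doubles the positive-frequency bins, so this version returns half of the envelope; a constant factor of this kind could be absorbed by a subsequent gain stage. The function name is hypothetical.

```python
import numpy as np


def hilbert_envelope(x):
    """Envelope extractor / Hilbert transformer 40 following FIG. 4."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)           # fast Fourier transformer 50 -> 51
    zeroed = spectrum.copy()
    zeroed[n // 2 + 1:] = 0.0          # zeroing unit 52: negative bins -> 53
    analytic = np.fft.ifft(zeroed)     # inverse FFT 54 -> complex signal 55
    return np.abs(analytic)            # rectifier 56 -> envelope signal 41


# Usage: an amplitude-modulated 5 kHz tone sampled at 16 kHz.
fs, n = 16000, np.arange(1600)
am = 1.0 + 0.5 * np.cos(2 * np.pi * 50 * n / fs)   # slowly varying amplitude
x = am * np.cos(2 * np.pi * 5000 * n / fs)          # modulated carrier
print(np.allclose(hilbert_envelope(x), 0.5 * am))   # True
```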

The frequency domain representation 51 of the spectrally folded audio signal 33 is a complex signal. A real signal can be represented by a sum of sinusoids with different phases, amplitudes and frequencies, and a fast Fourier transform (FFT) represents a signal as a sum of complex exponentials (e-powers). Since a sinusoid can be written as the sum of two complex exponentials, one with a positive and one with a negative frequency, the FFT spectrum of a real signal is (conjugate) symmetrical with respect to zero frequency (DC). By removing the negative frequencies in the zeroing unit 52, the spectrum of a complex signal (an analytic signal) is created, in which each sinusoid is represented by a single complex exponential. When the absolute value of the IFFT of this analytic signal is taken (i.e. by the rectifier 56), the time-domain envelope of the original input signal is found, because the absolute value of a complex exponential is equal to one, so that only the (time-varying) amplitude remains.
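For a single amplitude-modulated component the argument can be written out as follows. It assumes that the amplitude A(t) varies slowly compared with the phase φ(t), so that the positive- and negative-frequency terms do not overlap in frequency:

```latex
\begin{aligned}
x(t) &= A(t)\cos\phi(t)
      = \tfrac{1}{2}A(t)\,e^{j\phi(t)} + \tfrac{1}{2}A(t)\,e^{-j\phi(t)} \\
z(t) &= \tfrac{1}{2}A(t)\,e^{j\phi(t)}
      && \text{(negative-frequency term removed by zeroing unit 52)} \\
|z(t)| &= \tfrac{1}{2}A(t)\,\bigl|e^{j\phi(t)}\bigr| = \tfrac{1}{2}A(t)
      && \text{(rectifier 56, since } |e^{j\phi(t)}| = 1\text{)}
\end{aligned}
```

The factor 1/2 disappears if the positive-frequency bins are doubled, which yields the conventional analytic signal x(t) + j·H{x}(t), with H the Hilbert transform.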

FIG. 5 shows a block diagram of a prior art bandwidth extender 18 as known from the paper “Speech Enhancement Via Frequency Bandwidth Extension Using Line Spectral Frequencies” by S. Chennoukh, A. Gerrits, G. Miet and R. Sluijter, in the proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, May 8-11, 2001. An input narrowband speech signal, which is sampled at 8 kHz and which is supplied to the input 20 of the bandwidth extender 18, is first up-sampled by two (i.e. zeros are inserted between successive samples) by an up-sampler 60. The obtained up-sampled signal 61 is sampled at 16 kHz. It has the same spectrum in the lowband, i.e. 0-4 kHz, as the input signal and a folded version of it in the highband, i.e. 4-8 kHz. This signal 61 is then low-pass filtered in a low-pass filter 62 to remove the folded version in order to recover the same spectral properties as the input signal but sampled at 16 kHz. The low-pass filtered signal 63 is supplied to an LPC analysis filter 70 and to a down-sampler 64.
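The up-sampler 60 and the low-pass filter 62 can be illustrated with NumPy. The brick-wall FFT low-pass filter and the interpolation gain of 2 are assumptions made for this illustration (the text does not specify filter 62); a practical system would use a designed FIR or IIR filter. Function names are hypothetical.

```python
import numpy as np


def upsample_by_two(x):
    """Up-sampler 60: insert a zero between successive samples (8 -> 16 kHz)."""
    y = np.zeros(2 * len(x))
    y[0::2] = x
    return y


def brick_wall_lowpass(y):
    """Stand-in for low-pass filter 62: remove everything at or above a quarter
    of the sampling rate (4 kHz at 16 kHz). The gain of 2 (an assumption)
    restores the amplitude lost by the zero insertion."""
    spectrum = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(y))    # in cycles per sample, 0 ... 0.5
    spectrum[freqs >= 0.25] = 0.0
    return 2.0 * np.fft.irfft(spectrum, len(y))


# Usage: a 1 kHz tone sampled at 8 kHz.
x = np.cos(2 * np.pi * 1000 * np.arange(800) / 8000)
y = upsample_by_two(x)        # signal 61: the 1 kHz tone plus its 7 kHz image
s = brick_wall_lowpass(y)     # signal 63: only the 1 kHz tone, now at 16 kHz
```

With 1600 samples at 16 kHz each FFT bin is 10 Hz wide, so the tone sits in bin 100 and its folded image in bin 700.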

The low-pass filtered signal 63 is down-sampled by two by the down-sampler 64. Then, the resulting down-sampled signal 65 (which is sampled at 8 kHz) is modeled using an auto-regressive LPC model by means of an LPC analysis filter 66. The LPC analysis filter 66 derives LPC coefficients 67 from the down-sampled signal 65. These LPC coefficients 67 represent the spectrum of the input narrowband speech signal. Next, the narrowband LPC coefficients 67 are used by an envelope extender 68 to extend the spectral envelope of the narrowband signal and to derive wideband LPC coefficients 69. This extension of the spectral envelope is performed by mapping lowband line spectral frequencies (LSFs) to wideband LSFs by means of a set of mapping matrices.
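The auto-regressive LPC model of filter 66 can, for example, be estimated with the autocorrelation method and the Levinson-Durbin recursion. This common textbook choice is assumed here; the text does not state the estimation method, model order or windowing. The LSF mapping of the envelope extender 68 is not reproduced, since its mapping matrices are not given. Function names are hypothetical.

```python
import numpy as np


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of the frame x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve for the prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficient vector [1, a1, ..., ap] and the final
    prediction-error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i + 1):
            a[j] = a_prev[j] + k * a_prev[i - j]
        err *= 1.0 - k * k
    return a, err


def lpc(x, order):
    """LPC coefficients (cf. coefficients 67) of the frame x."""
    return levinson_durbin(autocorrelation(x, order), order)[0]


# Usage: recover the coefficients of a synthetic AR(2) process
# x[t] = 1.3 x[t-1] - 0.6 x[t-2] + e[t], i.e. A(z) = 1 - 1.3 z^-1 + 0.6 z^-2.
rng = np.random.default_rng(1)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for t in range(len(e)):
    x[t] = e[t]
    if t >= 1:
        x[t] += 1.3 * x[t - 1]
    if t >= 2:
        x[t] -= 0.6 * x[t - 2]
a_est = lpc(x, 2)   # close to [1.0, -1.3, 0.6]
```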

Then, the output signal 63 of the low-pass filter 62 is analyzed using an extended LPC analysis filter 70 on the basis of the wideband LPC coefficients 69. The analysis residual 71, which is expected to have a flat spectrum, is thereafter successively down-sampled and up-sampled by two (i.e. every other sample is set to zero) in spectral folding means 30. The successive down- and up-sampling realizes a spectral folding. The resulting spectrally folded signal 73 is a sparse signal that is used to excite a wideband synthesis filter 72 in order to obtain a wideband speech signal that is supplied to the output 22 of the bandwidth extender 18. The wideband synthesis filter 72 operates on the basis of the wideband LPC coefficients 69 and is the inverse of the analysis filter 70.
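The analysis, folding and synthesis chain of FIG. 5 can be sketched as follows. The analysis filter 70 is modelled as the FIR filter A(z) and the synthesis filter 72 as the all-pole filter 1/A(z), both with zero initial state and one fixed, illustrative coefficient set; a real implementation would update the coefficients frame by frame. Setting every other sample to zero is equivalent to multiplying by (1 + (-1)^n)/2, which keeps half of the original spectrum in place and adds a half-amplitude copy mirrored about a quarter of the sampling rate (4 kHz at 16 kHz). Function names and coefficients are hypothetical.

```python
import numpy as np


def analysis_filter(x, a):
    """LPC analysis filter A(z) (FIR, a[0] == 1): returns the residual."""
    return np.convolve(x, a)[: len(x)]


def synthesis_filter(e, a):
    """All-pole synthesis filter 1/A(z), the inverse of analysis_filter."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def spectral_fold(r):
    """Spectral folding means 30: down- and up-sample by two (zero odd samples)."""
    folded = np.array(r, dtype=float)
    folded[1::2] = 0.0
    return folded


# Usage: fixed coefficients standing in for the wideband LPC coefficients 69.
a = np.array([1.0, -0.9, 0.2])
n = np.arange(1600)
x = np.cos(2 * np.pi * 1000 * n / 16000)    # stand-in for signal 63
residual = analysis_filter(x, a)            # analysis residual 71
folded = spectral_fold(residual)            # spectrally folded signal 73
wideband = synthesis_filter(folded, a)      # output of synthesis filter 72
```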

FIG. 6 shows a block diagram of a part 74 of the bandwidth extender 18 of FIG. 5, which part 74 has been adapted in accordance with the principles of the present invention. In addition to the elements of the part 74 shown in FIG. 5, the part 74 shown in FIG. 6 comprises a noise shaper 32, combiners 86 and 88 and gain stages 80, 82, 84 and 92. The noise shaper 32 comprises an envelope extractor 40, a mixer 42 and a band-pass filter 90. The combiner 86 is equivalent to the combiner 34 in FIG. 2.

The part 74 of the bandwidth extender 18 as shown in FIG. 5 is obtained as a special case of the part 74 shown in FIG. 6 by setting gain factor a of the gain stage 92 to 0, gain factor b of the gain stage 84 to 2 and gain factor c of the gain stage 80 to 0. Note that the gain stage 84 is not shown in FIG. 5.

The spectrally folded signal 73 comprises both lowband and highband signal components. As only the highband part of the spectrally folded signal 73 suffers from harmonic shifts, it is not necessary to extract the envelope of the lowband part. Consequently, the lowband signal components are removed from the spectrally folded signal 73 by means of the gain stage 82 and the combiner 88. The lowband component of the spectrally folded signal 73 has half the amplitude of the analysis residual signal 71 (due to the properties of the spectral folding means 30, which comprise a cascade of a down-sampler by two and an up-sampler by two). The gain stage 82 therefore attenuates and inverts the analysis residual signal 71 by applying a gain factor of −0.5 to it. The resulting attenuated and inverted analysis residual signal is thereafter added to the spectrally folded signal 73 by means of the combiner 88, thus removing the lowband signal part from the spectrally folded signal 73. The resulting combined signal 85 only comprises highband signal components and is supplied to the envelope extractor 40 (similar to the signal 33 in FIG. 3).
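Because zeroing every other sample gives folded[n] = 0.5·r[n] + 0.5·(-1)^n·r[n], adding -0.5·r[n] leaves only the mirrored term 0.5·(-1)^n·r[n]. For a residual whose content lies below 4 kHz (at 16 kHz sampling), that term lies entirely above 4 kHz. The sketch below assumes such a band-limited residual; the function name is hypothetical.

```python
import numpy as np


def highband_of_folded(residual):
    """Gain stage 82 (-0.5) plus combiner 88: remove the lowband of signal 73."""
    residual = np.asarray(residual, dtype=float)
    folded = residual.copy()
    folded[1::2] = 0.0                   # spectral folding means 30 -> signal 73
    return folded + (-0.5) * residual    # combined signal 85


# Usage: a 1 kHz stand-in for the residual 71 at 16 kHz.
n = np.arange(1600)
r = np.cos(2 * np.pi * 1000 * n / 16000)
s85 = highband_of_folded(r)              # a single 7 kHz component remains
```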

The envelope extractor 40 extracts an envelope signal 87 from the signal 85 and supplies this signal 87 to the mixer 42. The mixer 42 generates a shaped noise signal 91 (similar to the signal 35 in FIG. 3) by multiplying a band-pass filtered noise signal 89 with the envelope signal 87. The band-pass filtered noise signal 89 is obtained by filtering a (white) noise signal 31 by means of a band-pass filter 90, which only passes the highband components of the noise signal 31.
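The highband noise path can be sketched as follows. The band-pass filter 90 is modelled as an FFT mask passing 4-8 kHz at an assumed 16 kHz sampling rate, and the envelope extractor 40 uses the FFT method of FIG. 4; both choices, and all names, are assumptions for illustration.

```python
import numpy as np

FS = 16000  # assumed wideband sampling rate in Hz


def bandpass_highband(noise, fs=FS, low_hz=4000.0):
    """Stand-in for band-pass filter 90: keep only components >= low_hz."""
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(noise), d=1.0 / fs)
    spectrum[freqs < low_hz] = 0.0
    return np.fft.irfft(spectrum, len(noise))


def envelope(x):
    """Envelope extractor 40: |IFFT| of the spectrum with negative bins zeroed."""
    spectrum = np.fft.fft(x)
    spectrum[len(x) // 2 + 1:] = 0.0
    return np.abs(np.fft.ifft(spectrum))


def shaped_highband_noise(signal_85, noise_31):
    """Mixer 42: band-pass filtered noise 89 times envelope 87 -> signal 91."""
    noise_89 = bandpass_highband(noise_31)
    envelope_87 = envelope(signal_85)
    return noise_89 * envelope_87
```

For a steady highband tone the envelope is constant, so the shaped noise is simply a scaled copy of the band-passed noise; for speech the envelope follows the temporal structure of signal 85.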

The shaped noise signal 91 is amplified/attenuated by the gain stage 92 and the resulting signal 93 is supplied to the combiner 86. The spectrally folded signal 73 is amplified/attenuated by the gain stage 84 and the resulting signal 95 is also supplied to the combiner 86. In addition, the analysis residual signal 71 is amplified/attenuated by the gain stage 80 and the resulting signal 81 is also supplied to the combiner 86. The combiner 86 adds the signals 93, 95 and 81 into a combined signal 97, which is supplied to the wideband synthesis filter 72.
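Combiner 86 is a weighted sum. The sketch feeds a 1 kHz test tone through the lowband path only (the shaped noise 91 is set to zero) to show that the 1 kHz component of the excitation 97 is restored to the level of the residual 71 when 0.5b + c = 1, using the tuned values a = 1.2, b = 1.1 and c = 0.45 reported in this description. The signal names and the test tone are illustrative assumptions.

```python
import numpy as np


def combiner_86(signal_91, signal_73, signal_71, a, b, c):
    """Gain stages 92 (a), 84 (b) and 80 (c) followed by combiner 86 -> signal 97."""
    return a * signal_91 + b * signal_73 + c * signal_71


n = np.arange(1600)
residual = np.cos(2 * np.pi * 1000 * n / 16000)   # stand-in for residual 71
folded = residual.copy()
folded[1::2] = 0.0                                # spectrally folded signal 73
shaped = np.zeros_like(residual)                  # shaped noise 91 omitted here
excitation = combiner_86(shaped, folded, residual, a=1.2, b=1.1, c=0.45)
E = np.fft.rfft(excitation)
```

Setting a = 0, b = 2 and c = 0 reduces the excitation to twice the folded signal, which corresponds to the prior-art configuration of FIG. 5 described above.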

In order for the wideband synthesis filter 72 to be able to reconstruct the lowband, the following relation between the gain factors b and c must hold: $0.5b + c = 1$. (These lowband signals are 100% correlated, so their amplitudes may be summed.) For the highband, the gain factors a and b must satisfy $(a/2)^2 + (b/2)^2 = 1$, and hence $a^2 + b^2 = 4$. (Here the signals are uncorrelated, so their energies must be summed.) For example, when $a = b = \sqrt{2}$, then $c = 1 - \tfrac{1}{2}\sqrt{2} \approx 0.3$. However, tuning can provide other combinations that give better results than the computed ones. Satisfactory results were obtained with the following setting: a = 1.2, b = 1.1 and c = 0.45.
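The two gain conditions can be checked numerically. The helper names below are hypothetical; the formulas are the lowband and highband relations stated above.

```python
import math


def c_for_lowband(b):
    """Lowband condition 0.5*b + c = 1 (correlated signals: amplitudes add)."""
    return 1.0 - 0.5 * b


def highband_energy(a, b):
    """Left-hand side of (a/2)**2 + (b/2)**2 = 1 (uncorrelated: energies add)."""
    return (a / 2.0) ** 2 + (b / 2.0) ** 2


a = b = math.sqrt(2.0)
print(round(c_for_lowband(b), 3))           # 0.293
print(round(highband_energy(a, b), 6))      # 1.0
print(round(highband_energy(1.2, 1.1), 4))  # 0.6625
```

The tuned setting satisfies the lowband condition exactly (0.5 · 1.1 + 0.45 = 1) but gives 0.6625 rather than 1 for the highband expression, so the tuning departs from the computed highband condition.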

The bandwidth extender 18 may be implemented by means of digital hardware or by means of software which is executed by a digital signal processor or by a general purpose microprocessor.

The scope of the invention is not limited to the embodiments explicitly disclosed. The invention is embodied in each new characteristic and each combination of characteristics. Any reference signs do not limit the scope of the claims. The word “comprising” does not exclude the presence of other elements or steps than those listed in a claim. Use of the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware.

1. A bandwidth extender (18) for generating a wideband audio signal from a narrowband audio signal, wherein the narrowband audio signal has a first bandwidth and the wideband audio signal has a second bandwidth, and wherein the second bandwidth is larger than the first bandwidth, wherein the bandwidth extender (18) comprises an input (20) for receiving the narrowband audio signal and an output (22) for supplying the wideband audio signal, and wherein the bandwidth extender (18) further comprises spectral folding means (30) coupled to the input (20), wherein the spectral folding means (30) are arranged for generating a spectrally folded audio signal (33) by spectrally folding at least part of the narrowband audio signal, characterized in that the bandwidth extender (18) comprises a noise shaper (32) for generating a shaped noise signal (35) by shaping a noise signal (31) in accordance with at least part of the spectrally folded audio signal (33), wherein the bandwidth extender (18) further comprises a combiner (34) for combining the shaped noise signal (35) and the spectrally folded audio signal (33) into the wideband audio signal.
2. The bandwidth extender (18) according to claim 1, characterized in that the noise shaper (32) is arranged for generating the shaped noise signal (35) by shaping the noise signal (31) in proportion to an envelope of at least part of the spectrally folded audio signal (33).
3. The bandwidth extender (18) according to claim 2, characterized in that the noise shaper (32) comprises an envelope extractor (40) for extracting an envelope signal (41) from the spectrally folded audio signal (33), wherein the noise shaper (32) further comprises a mixer (42) for generating the shaped noise signal (35) by mixing the noise signal (31) with the envelope signal (41).
4. The bandwidth extender (18) according to claim 3, characterized in that the envelope extractor (40) comprises a Hilbert transformer.
5. The bandwidth extender (18) according to claim 4, characterized in that the Hilbert transformer comprises a cascade of a Fourier transformer (50) for transforming a time domain representation of the spectrally folded signal (33) into a frequency domain representation (51) thereof, means (52) for zeroing the negative frequencies of the frequency domain representation (51), an inverse Fourier transformer (54) for transforming the zeroed frequency domain representation (53) into a time domain representation (55) thereof, and a rectifier (56) for generating the envelope signal (41) by rectifying the zeroed time domain representation (55).
6. A receiver (14) for receiving a narrowband audio signal from a transmission channel (16), wherein the receiver (14) comprises a bandwidth extender (18) as claimed in claim 1.
7. A method of generating a wideband audio signal from a narrowband audio signal, wherein the narrowband audio signal has a first bandwidth and the wideband audio signal has a second bandwidth, and wherein the second bandwidth is larger than the first bandwidth, the method comprising: generating a spectrally folded audio signal (33) by spectrally folding at least part of the narrowband audio signal, characterized in that the method further comprises: generating a shaped noise signal (35) by shaping a noise signal (31) in accordance with at least part of the spectrally folded audio signal (33), and combining the shaped noise signal (35) and the spectrally folded audio signal (33) into the wideband audio signal.
8. The method of generating a wideband audio signal from a narrowband audio signal according to claim 7, characterized in that the shaped noise signal (35) is generated by shaping the noise signal (31) in proportion to an envelope of at least part of the spectrally folded audio signal (33).
9. The method of generating a wideband audio signal from a narrowband audio signal according to claim 8, characterized in that the method further comprises: extracting an envelope signal (41) from the spectrally folded audio signal (33), and generating the shaped noise signal (35) by mixing the noise signal (31) with the envelope signal (41).